175 research outputs found
Optimal decentralized control of coupled subsystems with control sharing
Subsystems that are coupled due to dynamics and costs arise naturally in
various communication applications. In many such applications the control
actions are shared between different control stations giving rise to a
\emph{control sharing} information structure. Previous studies of
control-sharing have concentrated on the linear quadratic Gaussian setup and a
solution approach tailored to continuous-valued control actions. In this paper
a three-step solution approach for finite-valued control actions is presented.
In the first step, a person-by-person approach is used to identify redundant
data or a sufficient statistic for local information at each control station.
In the second step, the common-information based approach of Nayyar et al.\
(2011) is used to find a sufficient statistic for the common information shared
between all control stations and to obtain a dynamic programming decomposition.
In the third step, the specifics of the model are used to simplify the
sufficient statistic and the dynamic program. As an example, an exact solution
of a two-user multiple access broadcast system is presented.
Comment: Submitted to IEEE Transactions on Automatic Control
A Decision Theoretic Framework for Real-Time Communication
We consider a communication system in which the outputs of a Markov source
are encoded and decoded in \emph{real-time} by a finite memory receiver, and
the distortion measure does not tolerate delays. The objective is to choose
designs, i.e., real-time encoding, decoding, and memory update strategies that
minimize a total expected distortion measure. This is a dynamic team problem
with non-classical information structure [Witsenhausen:1971]. We use the
structural results of [Teneketzis:2004] to develop a sequential decomposition
for the finite and infinite horizon problems. Thus, we obtain a systematic
methodology for the determination of jointly optimal encoding, decoding, and
memory update strategies for real-time point-to-point communication systems.
Comment: 10 pages, 1 figure; Forty-Third Allerton Conference on Control, Communication and Computing
Sufficient statistics for linear control strategies in decentralized systems with partial history sharing
In decentralized control systems with linear dynamics, quadratic cost, and
Gaussian disturbance (also called decentralized LQG systems) linear control
strategies are not always optimal. Nonetheless, linear control strategies are
appealing due to analytic and implementation simplicity. In this paper, we
investigate decentralized LQG systems with partial history sharing information
structure and identify finite dimensional sufficient statistics for such
systems. Unlike prior work on decentralized LQG systems, we do not assume
partial nestedness or quadratic invariance. Our approach is based on the
common information approach of Nayyar \emph{et al.}, 2013, and exploits the
linearity of the system dynamics and control strategies. To illustrate our
methodology, we identify sufficient statistics for linear strategies in
decentralized systems where controllers communicate over a strongly connected
graph with finite delays, and for decentralized systems consisting of coupled
subsystems with control sharing or one-sided one step delay sharing information
structures.
Decentralized stochastic control
Decentralized stochastic control refers to the multi-stage optimization of a
dynamical system by multiple controllers that have access to different
information. Decentralization of information gives rise to new conceptual
challenges that require new solution approaches. In this expository paper, we
use the notion of an \emph{information-state} to explain the two commonly used
solution approaches to decentralized control: the person-by-person approach and
the common-information approach.
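As a minimal illustration of the information-state idea mentioned above (the numbers and matrices below are illustrative assumptions, not from the paper), the controller's belief over a hidden Markov state can be updated recursively and is sufficient for decision making:

```python
import numpy as np

# Sketch of an information state: in a partially observed system, the
# posterior (belief) over the hidden state is updated recursively and
# summarizes the entire observation history for decision making.
# The transition and observation matrices are illustrative assumptions.

P = np.array([[0.9, 0.1],     # P[s, s'] = transition probability
              [0.2, 0.8]])
O = np.array([[0.7, 0.3],     # O[s, y] = probability of observation y in state s
              [0.4, 0.6]])

def update_belief(belief, obs):
    """One-step recursive update: predict through P, correct with O."""
    predicted = belief @ P
    unnorm = predicted * O[:, obs]
    return unnorm / unnorm.sum()

b = np.array([0.5, 0.5])
for y in [0, 1, 1]:           # an arbitrary observation sequence
    b = update_belief(b, y)
```

The point of the sketch is that `b` is a fixed-dimensional quantity updated from itself and the latest observation, which is exactly the property an information state must have.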
Opportunistic capacity and error exponent regions for compound channel with feedback
Variable length communication over a compound channel with feedback is
considered. Traditionally, the capacity of a compound channel without feedback is
defined as the maximum rate that is determined before the start of
communication such that communication is reliable. This traditional definition
is pessimistic. In the presence of feedback, an opportunistic definition is
given. Capacity is defined as the maximum rate that is determined at the end of
communication such that communication is reliable. Thus, the transmission rate
can adapt to the channel chosen by nature. Under this definition, feedback
communication over a compound channel is conceptually similar to multi-terminal
communication. Transmission rate is a vector rather than a scalar; channel
capacity is a region rather than a scalar; error exponent is a region rather
than a scalar. In this paper, variable length communication over a compound
channel with feedback is formulated, its opportunistic capacity region is
characterized, and lower bounds for its error exponent region are provided.
Team Optimal Decentralized State Estimation of Linear Stochastic Processes by Agents with Non-Classical Information Structures
We consider the problem of team optimal decentralized estimation of a linear
stochastic process by multiple agents. Each agent receives a noisy observation
of the state of the process and delayed observations of its neighbors
(according to a pre-specified, strongly connected, communication graph). Based
on their observations, all agents generate a sequence of estimates of the state
of the process. The objective is to minimize the total expected weighted mean
square error between the state and the agents' estimates over a finite horizon.
In centralized estimation with weighted mean square error criteria, the optimal
estimator does not depend on the weight matrix in the cost function. We show
that this is not the case when the information is decentralized. The team
optimal decentralized estimates depend on the weight matrix in the cost
function. In particular, we show that the optimal estimate consists of two
parts: a common estimate which is the conditional mean of the state given the
common information and a correction term which is a linear function of the
offset of the local information from the conditional expectation of the local
information given the common information. The corresponding gain depends on the
weight matrix as well as on the covariance between the offset of agents' local
information from the conditional mean of the local information given the common
information. We show that the local and common estimates can be computed from a
single Kalman filter and derive recursive expressions for computing the offset
covariances and the estimation gains.
Comment: 16 pages, 6 figures. Submitted to Automatica, second version
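A scalar toy example can illustrate the structure of the estimate described above: a common (Kalman) estimate plus a gain times the offset of the local observation from its conditional mean given common information. All numerical values, including the gain `G_i`, are illustrative assumptions and are not computed from the paper's recursions.

```python
import numpy as np

# Toy scalar sketch of the decomposition
#   estimate_i = common_estimate + G_i * (local_obs - E[local_obs | common info])
# Dynamics: x' = a x + w, observation y = x + v.  Values are illustrative.

rng = np.random.default_rng(0)
a, q, r = 0.9, 1.0, 0.5
x_hat, p = 0.0, 1.0            # common estimate and its error variance

def kalman_step(x_hat, p, y):
    """One predict/update step of a scalar Kalman filter (common estimate)."""
    x_pred = a * x_hat
    p_pred = a * a * p + q
    k = p_pred / (p_pred + r)  # Kalman gain
    return x_pred + k * (y - x_pred), (1 - k) * p_pred

x = 0.0
for _ in range(50):
    x = a * x + rng.normal(scale=np.sqrt(q))
    y_common = x + rng.normal(scale=np.sqrt(r))   # observation in common information
    x_hat, p = kalman_step(x_hat, p, y_common)

# Agent i corrects the common estimate using its fresh local observation;
# here E[y_local | common info] = x_hat, and G_i is an illustrative gain.
G_i = 0.3
y_local = x + rng.normal(scale=np.sqrt(r))
estimate_i = x_hat + G_i * (y_local - x_hat)
```

In the paper the analogous gain is determined by the weight matrix and the offset covariances; the sketch only shows the common-plus-correction form of the estimate.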
Renewal Monte Carlo: Renewal theory based reinforcement learning
In this paper, we present an online reinforcement learning algorithm, called
Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with
a designated start state. RMC is a Monte Carlo algorithm and retains the
advantages of Monte Carlo methods including low bias, simplicity, and ease of
implementation while, at the same time, circumvents their key drawbacks of high
variance and delayed (end of episode) updates. The key ideas behind RMC are as
follows. First, under any reasonable policy, the reward process is ergodic. So,
by renewal theory, the performance of a policy is equal to the ratio of
expected discounted reward to the expected discounted time over a regenerative
cycle. Second, by carefully examining the expression for performance gradient,
we propose a stochastic approximation algorithm that only requires estimates of
the expected discounted reward and discounted time over a regenerative cycle
and their gradients. We propose two unbiased estimators for evaluating
performance gradients---a likelihood ratio based estimator and a simultaneous
perturbation based estimator---and show that for both estimators, RMC converges
to a locally optimal policy. We generalize the RMC algorithm to post-decision
state models and also present a variant that converges faster to an
approximately optimal policy. We conclude by presenting numerical experiments
on a randomly generated MDP, event-triggered communication, and inventory
management.
Comment: 9 pages, 5 figures
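The renewal-ratio idea at the heart of RMC can be sketched on a toy chain: estimate performance as the expected discounted reward over a regenerative cycle divided by the expected discounted time over a cycle. The two-state chain, transition probabilities, and rewards below are illustrative assumptions, not the paper's benchmarks.

```python
import random

# Sketch of the renewal-ratio estimate behind RMC: performance equals
# E[discounted reward per regenerative cycle] / E[discounted time per cycle],
# where a cycle ends on return to the designated start state.

random.seed(0)
GAMMA = 0.9
START = 0

def step(state):
    """Toy chain under a fixed policy: reward 1 in state 1, 0 in state 0;
    from either state, move to the other state with probability 0.5."""
    reward = 1.0 if state == 1 else 0.0
    nxt = 1 - state if random.random() < 0.5 else state
    return nxt, reward

def rmc_estimate(num_cycles=2000):
    total_r, total_t = 0.0, 0.0
    for _ in range(num_cycles):
        state, disc, cycle_r, cycle_t = START, 1.0, 0.0, 0.0
        while True:
            nxt, r = step(state)
            cycle_r += disc * r
            cycle_t += disc
            disc *= GAMMA
            state = nxt
            if state == START:        # renewal: cycle ends at the start state
                break
        total_r += cycle_r
        total_t += cycle_t
    return total_r / total_t          # ratio-of-expectations estimate

perf = rmc_estimate()
```

Because updates happen once per cycle rather than once per episode end, this style of estimator avoids the long update delays of episodic Monte Carlo while keeping its low bias.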
Distortion-transmission trade-off in real-time transmission of Markov sources
The problem of optimal real-time transmission of a Markov source under
constraints on the expected number of transmissions is considered, both for the
discounted and long-term average cases. This setup is motivated by applications
where transmission is sporadic and the cost of switching on the radio and
transmitting is significantly more important than the size of the transmitted
data packet. For this model, we characterize the distortion-transmission
function, i.e., the minimum expected distortion that can be achieved when the
expected number of transmissions is less than or equal to a particular value.
In particular, we show that the distortion-transmission function is a piecewise
linear, convex, and decreasing function. We also give an explicit
characterization of each vertex of the piecewise linear function.
To prove the results, the optimization problem is cast as a decentralized
constrained stochastic control problem. We first consider the Lagrange
relaxation of the constrained problem and identify the structure of optimal
transmission and estimation strategies. In particular, we show that the optimal
transmission is of a threshold type. Using these structural results, we obtain
dynamic programs for the Lagrange relaxations. We identify the performance of
an arbitrary threshold-type transmission strategy and use the idea of
calibration from multi-armed bandits to determine the optimal transmission
strategy for the Lagrange relaxation. Finally, we show that the optimal
strategy for the constrained setup is a randomized strategy that randomizes
between two deterministic strategies that differ only at one state. By
evaluating the performance of these strategies, we determine the shape of the
distortion-transmission function. These results are illustrated using an
example of transmitting a birth-death Markov source.
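The threshold-type transmission strategy identified above can be sketched in simulation: transmit only when the error between the source and the receiver's estimate reaches a threshold, and track the resulting distortion and transmission rate. The birth-death probabilities and the threshold value are illustrative assumptions.

```python
import random

# Sketch of a threshold-type transmission strategy: the transmitter sends the
# current state only when |x - x_hat| reaches THRESHOLD; on a transmission the
# receiver's estimate x_hat is reset to the true state.  Parameters are
# illustrative assumptions.

random.seed(1)
P_UP = P_DOWN = 0.3       # birth-death transition probabilities
THRESHOLD = 2             # transmit when |x - x_hat| >= THRESHOLD

def simulate(horizon=10000):
    x, x_hat = 0, 0       # source state and receiver's estimate
    distortion = transmissions = 0
    for _ in range(horizon):
        u = random.random()
        if u < P_UP:
            x += 1
        elif u < P_UP + P_DOWN:
            x -= 1
        if abs(x - x_hat) >= THRESHOLD:
            x_hat = x     # transmit: receiver learns the state exactly
            transmissions += 1
        distortion += (x - x_hat) ** 2
    return distortion / horizon, transmissions / horizon

avg_distortion, tx_rate = simulate()
```

Sweeping the threshold and plotting average distortion against transmission rate traces out a curve of the kind the distortion-transmission function characterizes exactly.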
Optimal Performance of Feedback Control Systems with Limited Communication over Noisy Channels
A discrete time stochastic feedback control system with a noisy communication
channel between the sensor and the controller is considered. The sensor has
limited memory. At each time, the sensor transmits encoded symbol over the
channel and updates its memory. The controller receives a noisy version of the
transmitted symbol, and generates a control action based on all its past
observations and actions. This control action is fed back into the
system. At each stage the system incurs an instantaneous cost depending on the
state of the plant and the control action. The objective is to choose encoding,
memory updating and control strategies to minimize the expected total costs
over a finite horizon, or the expected discounted cost over an infinite
horizon, or the expected average cost per unit time over an infinite horizon.
For each case we obtain a sequential decomposition of the optimization problem.
The results are extended to the case when the sensor makes an imperfect
observation of the state of the system.
Comment: Preprint of paper to appear in CDC 2006. 8 pages, 2 figures
Sufficient conditions for the value function and optimal strategy to be even and quasi-convex
Sufficient conditions are identified under which the value function and the
optimal strategy of a Markov decision process (MDP) are even and quasi-convex
in the state. The key idea behind these conditions is the following. First,
sufficient conditions for the value function and optimal strategy to be even
are identified. Next, it is shown that if the value function and optimal
strategy are even, then one can construct a "folded MDP" defined only on the
non-negative values of the state space. Then, the standard sufficient
conditions for the value function and optimal strategy to be monotone are
"unfolded" to identify sufficient conditions for the value function and the
optimal strategy to be quasi-convex. The results are illustrated by using an
example of power allocation in remote estimation.
Comment: 8 pages
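The folding construction can be sketched on a small chain: when the transition kernel is symmetric about zero, states $-s$ and $s$ can be merged into a single folded state $s \ge 0$. The random-but-symmetrized chain below is an illustrative assumption, not an example from the paper.

```python
import numpy as np

# Sketch of a "folded MDP" construction for a symmetric Markov chain on
# states {-N, ..., N}: merge each pair (-s, s) into folded state s >= 0.
# For t > 0 the folded kernel is P_f(s, t) = P(s, t) + P(s, -t).

rng = np.random.default_rng(0)
N = 3
states = np.arange(-N, N + 1)                 # -3, ..., 3
S = len(states)

# Build a transition matrix symmetric about zero: P(s, s') = P(-s, -s').
P = rng.random((S, S))
P = (P + P[::-1, ::-1]) / 2                   # enforce reflection symmetry
P /= P.sum(axis=1, keepdims=True)             # row-normalize (symmetry preserved)

def fold(P):
    """Merge states -s and s into folded state s on {0, ..., N}."""
    Pf = np.zeros((N + 1, N + 1))
    for i, s in enumerate(states):
        for j, t in enumerate(states):
            Pf[abs(s), abs(t)] += P[i, j]
    # Rows for s > 0 aggregate two identical source rows (by symmetry),
    # so renormalizing recovers a valid stochastic matrix.
    Pf /= Pf.sum(axis=1, keepdims=True)
    return Pf

P_folded = fold(P)
```

In the paper, the same reduction is applied to the dynamics and cost, after which standard monotonicity conditions on the folded MDP "unfold" into quasi-convexity conditions on the original one.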